This paper describes Python, ROOT, Iguana and browser-based clients for the Clarens web services framework. 
Back-end services provided include file access, proxy escrow, virtual organization management, Storage Resource 
Broker access, job execution, and a relational database analysis interface. 
1. Introduction 

The Clarens web services platform, described in a 
companion article, acts as a go-between for distributed 
clients on the wide area network to access 
services using the widely implemented XML-RPC and 
SOAP data serialization standards on top of the lower- 
level HTTP protocol. Its usefulness must ultimately 
be measured by the usefulness of the services and 
clients themselves, however. 

This paper is divided into two parts describing the 
currently implemented services as well as clients tak- 
ing advantage of these services. 



2. Services 

A rather terse overview of the current Clarens services 
is given below. The reader is referred to the 
Clarens web page for full documentation of all 
methods and modules. 



2.1. File access 

Accessing files on a remote machine remains the 
most useful service any middleware product can provide. 
This is evident from the more than 40 million 
web servers that are deployed worldwide. Although 
the percentage of static files served is hard to gauge, 
the SPECweb99 benchmark puts this number at 
70% of requests. 

Clarens serves files in two different ways: in response 
to standard HTTP GET requests, and 
via a file.read() service method. A virtual server 
root directory, which may be any directory on the 
server system, can be defined for each of the two in 
the server configuration file. 

The file.read() method takes a filename, an offset, 
and the number of bytes to return to the client. 
Error messages are returned as serialized RPC responses. 
Network I/O is handed off to the web 
server, which uses the zero-copy sendfile() system 
call where available to minimize CPU usage and increase 
throughput. 
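
A client-side loop over this method might look like the following minimal sketch. It assumes that the remote call is available as a callable taking (filename, offset, nbytes) and returning bytes, and that an empty result signals end of file. The return type, the end-of-file convention, and the names fetch_file and fake_read are illustrative assumptions, not part of the documented Clarens API.

```python
from typing import Callable


def fetch_file(read: Callable[[str, int, int], bytes],
               filename: str, chunk_size: int = 65536) -> bytes:
    """Fetch a whole remote file by repeated (filename, offset, nbytes) reads."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    parts = []
    offset = 0
    while True:
        chunk = read(filename, offset, chunk_size)
        if not chunk:           # assumed end-of-file convention: empty result
            break
        parts.append(chunk)
        offset += len(chunk)    # advance by what was actually returned
    return b"".join(parts)


# Local stand-in for the remote method, used only to exercise the loop.
_FAKE_FILES = {"/data/run1.txt": b"0123456789" * 10}


def fake_read(filename: str, offset: int, nbytes: int) -> bytes:
    return _FAKE_FILES[filename][offset:offset + nbytes]
```

Advancing the offset by the length actually returned, rather than by the requested size, keeps the loop correct if the server returns short reads.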



The file module also provides methods to obtain 
directory information, and file.md5() to obtain a 
hash value for checking file integrity. 
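
An integrity check on the client side could be sketched as follows. The sketch assumes that the server-side hash is delivered as a hexadecimal MD5 digest string; that format, and the helper names md5_of_chunks and verify_download, are assumptions made for illustration.

```python
import hashlib
from typing import Iterable


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hash data incrementally so a large file never has to sit in memory."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def verify_download(chunks: Iterable[bytes], remote_md5: str) -> bool:
    """Compare the local digest with the value reported by the server."""
    return md5_of_chunks(chunks) == remote_md5.strip().lower()
```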



2.2. Proxy escrow 

Despite the use of asymmetric key cryptography, 
key management remains a problem, with private keys 
and certificates (credentials) having to be present on 
the client system. In the case where the same credentials 
must be used from different places, e.g. a person's 
desktop, laptop and other computer systems, this is 
inconvenient and also degrades the security of the 
credentials. 

In analogy to the MyProxy project, Clarens offers 
the ability to store and retrieve short-lived, self-signed 
certificates (called proxy certificates in Grid-oriented 
literature). 

The RPC API provides the methods proxy.store(), 
proxy.retrieve(), and proxy.delete() for managing 
stored proxy certificates. Combinations of private 
key/certificate pairs or proxy/certificate pairs may be 
stored in this way, with the certificate distinguished name 
(DN) acting as a unique identifier. The credential 
information is stored in encrypted form, using a 
symmetric cipher keyed by a password provided when 
invoking the store method. The password itself is 
not stored, for obvious reasons. 
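
The storage scheme described above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the Clarens implementation: the class name ProxyEscrow, the PBKDF2 key derivation, and the HMAC-based keystream are all stand-ins chosen so that the example runs with the Python standard library alone. A real deployment would use a vetted cipher.

```python
import hashlib
import hmac
import os


def _derive_keys(password: str, salt: bytes):
    """Derive separate encryption and integrity keys from the password."""
    material = hashlib.pbkdf2_hmac("sha256", password.encode(), salt,
                                   100_000, dklen=64)
    return material[:32], material[32:]


def _keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher (illustration only): XOR with an HMAC keystream."""
    out = bytearray()
    for start in range(0, len(data), 32):
        pad = hmac.new(key, start.to_bytes(8, "big"), hashlib.sha256).digest()
        out.extend(b ^ p for b, p in zip(data[start:start + 32], pad))
    return bytes(out)


class ProxyEscrow:
    """Hypothetical escrow: credentials keyed by DN, password never stored."""

    def __init__(self):
        self._records = {}  # DN -> (salt, ciphertext, tag)

    def store(self, dn: str, credential: bytes, password: str) -> None:
        salt = os.urandom(16)  # fresh salt, so each record gets its own keys
        enc_key, mac_key = _derive_keys(password, salt)
        ciphertext = _keystream_xor(enc_key, credential)
        tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
        self._records[dn] = (salt, ciphertext, tag)

    def retrieve(self, dn: str, password: str) -> bytes:
        salt, ciphertext, tag = self._records[dn]
        enc_key, mac_key = _derive_keys(password, salt)
        expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise PermissionError("wrong password or corrupted record")
        return _keystream_xor(enc_key, ciphertext)

    def delete(self, dn: str) -> None:
        self._records.pop(dn, None)
```

Only the salt, ciphertext, and integrity tag are kept, so a retrieval succeeds only if the caller supplies the password used when storing.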

From the above it should be obvious that this ap- 
proach presents a chicken-and-egg problem of having 
to present credentials in order to obtain credentials. 
This may be solved in two slightly different ways. 
Firstly, a web portal may be constructed that takes 
input from the user and acts as an intermediary to 
log into the Clarens server in question and retrieve 
the credentials. This can be done with the portal re- 
siding on an arbitrary machine. Secondly, the ability 
of Clarens to respond to HTTP GET requests (i.e. act 
as a simple web server) may be used to construct such 
a portal on the Clarens server itself, which has access 
to the server methods as an unprivileged user. These 
web portals may also be accessed programmatically 
from within programs or scripts. 
Table I VO management using the group module. 

Method        Description 
create        Create a new group 
delete        Remove a group 
add_users     Add members to a group 
add_admins    Add administrators to a group 
users         Lists the group members 
admins        Lists the group admins 



Table II ACL management using the system module. 

Method           Description 
add_acl_allow    Adds users and groups to the allow list of a method 
add_acl_deny     Adds users and groups to the deny list 
add_acl_spec     Create a new ACL 
del_acl_spec     Delete an ACL 
get_acl_spec     List ACLs 
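
The allow and deny lists managed by the methods of Table II can be pictured with the following minimal sketch. The evaluation rules used here (the most specific method entry is consulted first, deny wins over allow within an entry, and access is refused when nothing matches) are assumptions made for illustration, as are the class and method names.

```python
class MethodACL:
    """Hypothetical hierarchical ACL keyed by dotted method names."""

    def __init__(self):
        self._specs = {}  # "module" or "module.method" -> allow/deny sets

    def _spec(self, method: str) -> dict:
        return self._specs.setdefault(method, {"allow": set(), "deny": set()})

    def add_allow(self, method: str, *dns: str) -> None:
        self._spec(method)["allow"].update(dns)

    def add_deny(self, method: str, *dns: str) -> None:
        self._spec(method)["deny"].update(dns)

    def is_allowed(self, method: str, dn: str) -> bool:
        parts = method.split(".")
        # Walk from the most specific name ("file.read") up to the module ("file").
        for depth in range(len(parts), 0, -1):
            spec = self._specs.get(".".join(parts[:depth]))
            if spec is None:
                continue
            if dn in spec["deny"]:   # assumed rule: deny takes precedence
                return False
            if dn in spec["allow"]:
                return True
        return False                 # assumed default: no match means no access
```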



may be perceived as less secure than keeping the credentials 
on possibly multiple systems as files or in 
web browsers. 

2.3. Virtual organization management 

The Clarens authentication architecture is built 
around the concept of a hierarchical virtual organization 
(VO) of groups and subgroups of individuals identified 
by unique distinguished names (DNs). These 
individuals may be either people or servers. 

To ease administration of the VO, a set of methods 
is provided in the group module to create, delete, and 
list groups and their members and administrators. 

The most important of these methods are described 
in Table I. 
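
A toy in-memory model of such a group hierarchy is sketched below. The method names mirror Table I, but the dotted group naming ("cms.analysis" as a subgroup of "cms") and the rule that members of a parent group count as members of its subgroups are assumptions made for illustration.

```python
class VOGroups:
    """Hypothetical model of a hierarchical VO keyed by dotted group names."""

    def __init__(self):
        self._users = {}
        self._admins = {}

    def create(self, name: str) -> None:
        self._users.setdefault(name, set())
        self._admins.setdefault(name, set())

    def delete(self, name: str) -> None:
        self._users.pop(name, None)
        self._admins.pop(name, None)

    def add_users(self, name: str, dns) -> None:
        self._users[name].update(dns)

    def add_admins(self, name: str, dns) -> None:
        self._admins[name].update(dns)

    def users(self, name: str) -> list:
        return sorted(self._users[name])

    def admins(self, name: str) -> list:
        return sorted(self._admins[name])

    def is_member(self, name: str, dn: str) -> bool:
        # Assumed rule: membership in any ancestor group implies membership.
        parts = name.split(".")
        return any(dn in self._users.get(".".join(parts[:i]), ())
                   for i in range(1, len(parts) + 1))
```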

In addition to managing the VO structure, the 
group module also provides methods to store, retrieve 
and search for certificates. Certificate Authority cer- 
tificates may similarly be searched and retrieved, but 
not stored, since these certificates are used to ensure 
the uniqueness of client distinguished names. Instead, 
CA certificates are managed by the system adminis- 
trator in the form of files stored in a designated direc- 
tory. 

2.4. Access control management 

The Clarens authorization architecture is built 
around the concept of hierarchical access control lists 
for RPC methods [16]. To administer these ACLs, 
methods are provided to create and modify lists 
2.5. Storage Resource Broker access 

The SDSC Storage Resource Broker (SRB) provides 
a uniform interface to external applications to access 
various storage media, including local and remote file 
systems, and tape storage. It is also a client-server 
system, with Clarens acting as an SRB client on behalf 
of its own clients. The Clarens SRB API provides 
methods to initialize a connection to the SRB server 
and browse, store, and retrieve files. 

This interface is not currently deployed, however, 
since it exposed a critical impedance mismatch be- 
tween these two client-server systems: Clarens uses 
an entirely stateless connection protocol, while SRB 
uses a stateful protocol. This means that in the cur- 
rent implementation a new SRB connection must be 
initiated upon each method invocation by the Clarens 
client, which results in very poor performance. 

Work is underway to remedy this mismatch by uti- 
lizing Clarens agents that can hold persistent connec- 
tions to SRB on behalf of clients. Another approach 
being considered is to implement a stateful protocol 
interface to Clarens itself. 
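
The agent approach can be sketched as a small cache of stateful connections that stateless calls are routed through. Everything named here (ConnectionAgent, the connect factory, the per-DN keying) is a hypothetical illustration of the idea, not the design under development.

```python
class ConnectionAgent:
    """Hypothetical agent holding one persistent connection per client DN."""

    def __init__(self, connect):
        self._connect = connect   # factory: dn -> stateful connection object
        self._connections = {}

    def call(self, dn: str, operation: str, *args):
        conn = self._connections.get(dn)
        if conn is None:          # connect only on the first call for this DN
            conn = self._connections[dn] = self._connect(dn)
        return getattr(conn, operation)(*args)

    def close(self, dn: str) -> None:
        conn = self._connections.pop(dn, None)
        if conn is not None:
            conn.close()


class FakeConnection:
    """Stand-in for a stateful back-end connection; counts how many are opened."""
    opened = 0

    def __init__(self, dn: str):
        FakeConnection.opened += 1
        self.dn = dn
        self.closed = False

    def list_files(self, path: str) -> list:
        return [path + "/a.root", path + "/b.root"]

    def close(self) -> None:
        self.closed = True
```

With this arrangement, repeated stateless method invocations from the same client reuse one back-end connection instead of opening a new one each time.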



2.6. Job execution 

Next to file access, the ability to execute jobs on a 
remote machine remains one of the cornerstones of dis- 
tributed computing. Using the Clarens distinguished 
name to system user mapping [16], system commands 
can be executed by remote users. 

As is common practice, Clarens makes use of a small 
compiled program, called suexec, to change the permissions 
of the resulting process. This program can be 
more easily audited for security than the entire depen- 
dency chain of the main Clarens code. Upon receipt 
of a shell command, a directory owned by the remote 
user is created where the command's output and error 
messages are stored. The working directory of the re- 
sultant process is also changed to this directory, and 
any newly created files may be accessed remotely us- 
ing a job ID which is managed by Clarens. 
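
The per-job directory handling described above can be sketched as follows. This is an illustration only: it runs the command as the current user (there is no suexec step), takes the command as an argument list rather than a shell string, and the function name run_job and the file names stdout and stderr are assumptions.

```python
import subprocess
import sys
import tempfile
import uuid
from pathlib import Path


def run_job(command: list, base_dir: str):
    """Run a command in a fresh per-job directory and capture its output there."""
    job_id = uuid.uuid4().hex
    job_dir = Path(base_dir) / job_id
    job_dir.mkdir(parents=True)
    with open(job_dir / "stdout", "wb") as out, \
            open(job_dir / "stderr", "wb") as err:
        # The job directory doubles as the working directory of the process.
        result = subprocess.run(command, cwd=job_dir, stdout=out, stderr=err)
    return job_id, result.returncode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        job_id, code = run_job([sys.executable, "-c", "print('hello')"], base)
        print(job_id, code, (Path(base) / job_id / "stdout").read_text())
```

Files written by the job land in the same directory as its captured output, so a single job ID is enough to locate everything the job produced.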

It should be pointed out that this interface is not 
designed to schedule jobs on Clarens servers, but is 
most useful for handing jobs to the schedulers 
themselves, and retrieving the resultant output 
files. 

This interface is currently used to develop inter- 
active remote analysis using the CMS ORCA anal- 
ysis package, where analysis jobs themselves become 
Clarens servers (albeit less featureful ones) that can 
act as personal remote application servers to allow in- 
2.7. Relational database analysis interface 

An interface to the STL-based Object Caching And 
Transport System (SOCATS), being developed by Caltech, 
allows remote users to query large physics 
datasets using standard relational database (SQL) 
queries. 

SOCATS results are returned in the form of an object 
file formatted as a ROOT [11] tree, which can be 
retrieved by the above-mentioned file access methods. 
A ROOT interface to Clarens, which was successfully 
demonstrated at the 2002 Supercomputing conference, 
is also described below. 



3. Clients 

One of the express aims of the Clarens architecture 
is to use widely deployed interfaces to lower the im- 
plementation barrier for new client applications. The 
availability of HTTP, SSL and SOAP/XML-RPC implementations 
on most platforms and programming 
languages helps us achieve this goal with a minimum 
of new code. 



3.1. Python 

Python is a dynamically typed, object-oriented 
scripting language. Its programs are compiled to 
a platform-independent byte-code, similar in many 
ways to Java. Its built-in support for HTTP, 
SSL, and XML-RPC, combined with the rapid prototyping 
abilities inherent in a scripting language, made 
it a natural choice as the default client-side development 
language. 

The Python Clarens client is implemented as a pure 
Python class called clarens_client. Its constructor 
method takes the server URL as an argument and, optionally, 
the certificate and private key files to be used in the 
connection. The constructor method 
initiates a connection with the server and authenticates 
the user using the credentials stored in the standard 
places by the Globus toolkit, including proxy credentials 
if they exist, or those provided to the constructor 
method. 

The clarens_client object maps all non-local 
method calls to remote procedure calls, handling se- 
rializing/deserializing of the method arguments and 
return values transparently. The following example of 
the echo.echo() method demonstrates this: 

>>> obj = clarens_client("http://server.org/") 

>>> obj.echo.echo("Hello") 
['Hello'] 



That is, the echo.echo() remote method is invoked directly 
from the Python command line, and the result is returned 
to the caller as a native Python string. Results are 
always wrapped in a list, as indicated by the 
square brackets. 
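
The mapping of attribute access to remote calls can be sketched with Python's __getattr__ hook, the same pattern used by the standard library's xmlrpc.client.ServerProxy. The class names and the send callable below are hypothetical; the sketch does not reproduce the actual clarens_client code, and the loopback function merely imitates the list-wrapped echo result described above.

```python
class _RemoteMethod:
    """Accumulates a dotted method name and performs the call when invoked."""

    def __init__(self, send, name: str):
        self._send = send
        self._name = name

    def __getattr__(self, attr: str):
        if attr.startswith("_"):
            raise AttributeError(attr)
        return _RemoteMethod(self._send, self._name + "." + attr)

    def __call__(self, *args):
        return self._send(self._name, args)


class RemoteProxy:
    """Maps every non-local attribute to a remote procedure call."""

    def __init__(self, send):
        self._send = send  # callable(method_name, params) -> result

    def __getattr__(self, name: str):
        # Only reached when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        return _RemoteMethod(self._send, name)


def loopback(method: str, params: tuple):
    """Local stand-in for the network transport, for demonstration only."""
    if method == "echo.echo":
        return list(params)
    raise ValueError("unknown method: " + method)
```

Because the dotted name is only assembled on attribute access and the transport is only invoked on the final call, no stub code is needed per remote method.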

3.2. ROOT 

The object-oriented modular architecture of the 
ROOT analysis package, combined with its rich 
set of built-in objects and wide adoption in the high 
energy physics community makes it a very useful client 
for remote analysis functionality. 

In analogy to the Python client, the Clarens ROOT 
client handles authentication transparently, and provides 
the user with a Clarens object to communicate 
with the remote server. This client does not do automatic 
serialization/deserialization of arguments and 
return values to native ROOT objects; a lower-level 
interface must be used instead for general remote procedure 
calls. 

It does, however, provide a convenient interface 
to read remote files using the Clarens server, via a 
TCWebFile object derived from the native TFile ob- 
ject. This object provides all the functionality of the 
local version, allowing transparent analysis of remote 
files from the ROOT command line, scripts or com- 
piled code by changing only the object type. Using 
the interactive ROOT object browser it is possible to 
browse the structure and content of remote ROOT 
files quickly and easily, with the ability to display his- 
tograms and other types of plots contained in the re- 
mote files interactively or programmatically. 

The TFile base class supports a dynamic local 
caching mechanism that is used to optimize file trans- 
fers, so that the extremes of transferring the whole 
remote file to the local client, as well as making a re- 
mote procedure call for small parts of the file can be 
avoided, striking a balance between bandwidth and 
latency constraints. 
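
The trade-off between fetching whole files and issuing one remote call per small read can be illustrated with a simple read-through block cache. This is a generic sketch in Python, not the ROOT TFile caching mechanism; the class name, the fixed block size, and the unbounded cache are simplifying assumptions.

```python
class BlockCache:
    """Read-through cache that fetches a remote file in fixed-size blocks."""

    def __init__(self, read, filename: str, block_size: int = 4096):
        self._read = read              # callable(filename, offset, nbytes) -> bytes
        self._filename = filename
        self._block_size = block_size
        self._blocks = {}              # block index -> bytes (never evicted here)
        self.remote_calls = 0

    def _block(self, index: int) -> bytes:
        if index not in self._blocks:
            self.remote_calls += 1
            self._blocks[index] = self._read(
                self._filename, index * self._block_size, self._block_size)
        return self._blocks[index]

    def read(self, offset: int, nbytes: int) -> bytes:
        if nbytes <= 0:
            return b""
        first = offset // self._block_size
        last = (offset + nbytes - 1) // self._block_size
        data = b"".join(self._block(i) for i in range(first, last + 1))
        start = offset - first * self._block_size
        return data[start:start + nbytes]
```

Small reads that fall inside an already fetched block cost no further network round trips, while untouched parts of the file are never transferred.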

Another convenience class is the 
TCSystemDirectory, derived from the 
TSystemDirectory base class. This class provides 
an interface to the directory browsing functionality 
in the Clarens file module, allowing the client to 
interactively traverse the remote directory structure 
using the ROOT object browser. Any remote ROOT 
files may be opened by clicking on their icons in the 
browser. The TCSystemDirectory class may also 
be used programmatically from ROOT scripts or 
compiled C++ programs. 

3.3. IGUANA 

IGUANA is an interactive visualization toolkit, 
support for accessing remote OpenInventor-formatted 
files via Clarens for local display and manipulation. 

3.4. Browser interface 

The web browser is currently the most widely 
used distributed computing tool, bar none. Mod- 
ern browsers have native support for SSL-encrypted 
connections and client-side certificate authentication, 
making the browser an ideal platform for a Clarens interface. 

After initial experiments with client-side Java ap- 
plets for communicating with the server, it was de- 
cided to use the Javascript language embedded in 
most browsers to handle this task. Since Clarens is 
able to serve web pages in response to HTTP GET 
requests, the browser interface is implemented as a se- 
ries of static web pages that embed Javascript scripts 
to handle communication and interface display using 
dynamic HTML. This implementation eliminates the 
need for clients to install any additional software apart 
from a web browser, which most people already have. 

The browser interface uses XML-RPC for data 
serialization since it is far simpler than the 
SOAP protocol. As with the 
Python interface, argument and return value serialization/deserialization 
is handled transparently by the 
provided Javascript libraries, made easy by the object-oriented, 
loosely typed nature of the language. 
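
To give a feel for how little is involved in XML-RPC serialization, the following sketch builds and parses a request and a response with Python's standard xmlrpc.client module. Python is used here only for consistency with the other examples; the browser interface itself uses Javascript libraries. The echo.echo method name is taken from the Python example, and the argument values are arbitrary.

```python
import xmlrpc.client

# Serialize a call to echo.echo("Hello", 42) into an XML-RPC request body.
request_xml = xmlrpc.client.dumps(("Hello", 42), methodname="echo.echo")

# A server would parse the body back into native values and a method name.
params, method = xmlrpc.client.loads(request_xml)

# A response carries exactly one value, here a list, wrapped in a 1-tuple.
response_xml = xmlrpc.client.dumps((["Hello", 42],), methodresponse=True)
result, _ = xmlrpc.client.loads(response_xml)

if __name__ == "__main__":
    print(request_xml)
    print(method, params, result[0])
```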

Functionality currently provided includes browsing 
a remote file repository, with the ability to download 
files, and virtual organization management. 



4. Conclusion 

Clarens provides a growing list of services and useful 
client implementations for doing distributed comput- 
ing in a Grid-based environment. 